Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

similar(::SparseMatrixCSC, dims) returns sparse matrix with empty space (#26560) #30435

Closed
wants to merge 24 commits into from
Closed

similar(::SparseMatrixCSC, dims) returns sparse matrix with empty space (#26560) #30435

wants to merge 24 commits into from

Conversation

KlausC
Copy link
Contributor

@KlausC KlausC commented Dec 18, 2018

As discussed in issue #26560 the reservation of space for rowvaland nzval in the same size as for the origin leads to inaccessible (by the exported API) memory
locations.
The behaviour now is the same for matrices as for sparse vectors in this respect.
It was checked, that in stdlib no use is made of this space (by modifying the internal fields). Actually no similar of sparse matrices or vectors at all is used there.

Copy link
Member

@ViralBShah ViralBShah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a detail that will not be relevant to most users. Let's remove it from NEWS.

@KlausC
Copy link
Contributor Author

KlausC commented Dec 18, 2018

I removed the line in NEWS.md via conflict resolution.

@ViralBShah ViralBShah added the sparse Sparse arrays label Dec 18, 2018
@nalimilan nalimilan requested a review from Sacha0 December 18, 2018 20:41
@ViralBShah
Copy link
Member

ViralBShah commented Dec 19, 2018

I misunderstood this originally. I was under the impression that this would only prevent the allocation of extra memory in the case that the input matrix's rowval and nzval had allocated more memory in these data structures past the end of colptr[end+1].

IIUC, this PR will always allocate zero memory for nzval and rowval (or am I misunderstanding the PR?). I suspect this could actually end up being a major breaking change in the package ecosystem.

@KlausC
Copy link
Contributor Author

KlausC commented Dec 19, 2018

@ViralBShah, yes it could, but only, if the package ecosystem relies on a mis-feature.
As I tried to explain, the "preserved space" can only be used, by accessing the internal fields (nzvaland rowval) directly, but not with any exported API of SparseArrays. That means, for the "good" user the space is unusable. The bad guy, could also do resize! to the internal vectors, if he needed the space.

IMHO maintaining the invariant S.n == length(S.colptr)-1 && S.colptr[end]-1 == length(S.rowval) == length(S.nzval) would be easy and that would improve quality of the type design of SparseMAtrixCSC considerably.

BTW, I did not hear of any example, where the feature is used in an existing package.

@ViralBShah
Copy link
Member

I do know that people like the ability to use the internal representation of the sparse arrays. I agree that this is not a visible/documented API. But the question is, what does a user expect from similar for sparse arrays and what promises should we provide? For example, in dense arrays, you get the storage allocated. In sparse, it is a little more questionable. Should you get the empty container that you populate on your own, or should you be given the same structure?

I think that giving the same nonzero structure (basically colptr being the same as input) is a more reasonable default for sparse similar.

The only way to know if packages depend on this is through pkgeval. Not sure how easy it is to use.

@tkf
Copy link
Member

tkf commented Dec 19, 2018

FWIW, I am another user who were very surprised that similar copied non-zero structure. I was expecting similar to only copy the "logical shape" in this case; i.e., container type, element type, and size.

The only reason why I didn't create a PR myself was that I thought strict application of semver would not allow this change. But if Julia devs are OK with it would be very nice to have this change.

@KlausC
Copy link
Contributor Author

KlausC commented Dec 20, 2018

I think that giving the same nonzero structure (basically colptr being the same as input) is a more reasonable default for sparse similar.

Hi @ViralBShah, I agree with you, this functionality is present, in similarwithout giving new dimensions. If new dimensions are given, the situation is not as easy to implement as I did.

Currently, in the case, there are dimension arguments 2 cases are distinguished:

  • result is SparseVector. In this case the underlying storage consists two empty arrays. That is expected by most users and makes sense to me.
  • result is SparseMatrixCSC. In this case the colptr is filled with all 1 of size S.n.
    So far - so good: it is not a copy of the original but the colptr of an empty matrix! That is consitent.
    Now the array fields rowval and nzvalare allocated with the same size as the corresponding original fields. And that is not consistent with the colptr, which indicates an empty array.

Your suggestion basically colptr being the same as input would also be possible. But what is the non-zero structure if only colptr is populated, but not rowval? It is of no value at all!
So consequently, if we preserve colptr, we should also preserve rowval, as far as required.
If the new n is > the original one, we could extend colptr to length n+1 and fill the trailing cells with the same value as the original end. The rowvalcould be copied.
If the new n is < the original one, we could copy the leading 1:n+1 of colptr and restrict the length of both rowval and nzval to maximum(colptr) - 1.
In both cases with changing n, the non-zero structure would be preserved as far as possible. Depending on the new n we would effectively remove trailing columns or add empty columns.
If mis changing, we are fine when the new m is greater than to old one.
If the new m is smaller than the original one, we need to delete all cells from rowval, which are greater than the new m. That could also reduce size of rowvaland nzval.
If you think, the last proposal has a chance of being accepted, I can implement it as an alternative.
I am not completely happy with this solution, though, because no we preserve the non-zero structure in the 2-dimensional case, but do not in the one-dimensional case. An what, if we change from vector to matrix?

On important drawback of the current situation is, that if you want a similar of a really huge sparse matrix, with much smaller dimensions (for example you want to store in it information derived from regular blocks of the original), you can unnecessarily run out of memory.

IIRC a "breaking change" happens only if the documented behaviour changes. As similar does not mention any assertions about the size of the internal storage, it would be ok to change the size without notice.

@Sacha0
Copy link
Member

Sacha0 commented Dec 20, 2018

IIRC this behavior is intentional and exploited by e.g. sparse broadcast to avoid repeated allocation; ref.
#26560 (comment). More this weekend if necessary. Best! S

@KlausC KlausC closed this Dec 27, 2018
@KlausC KlausC reopened this Dec 27, 2018
@mbauman
Copy link
Member

mbauman commented Dec 31, 2018

I'm in favor of this change and similar is not directly used in sparse broadcast so that's fine (it does use the same concept, but does so directly on the internal structure).

There are two cases here. Either folks are:

  • Using entirely "public" APIs, which typically don't take advantage of this empty space. In such cases, they'll see a performance improvement.
  • Using the internal structures entirely. In such cases, they're often manipulating the length of these vectors in any case. So either they'll see an error (which we'll have to address via PkgEval during the release process), or they'll already be resizing appropriately.

So I'd say we go whole-hog here and always return empty storage, regardless of the similar call. I think that's easier to reason about, but I'm not a terribly heavy sparse user myself.

stdlib/SparseArrays/src/sparsematrix.jl Outdated Show resolved Hide resolved
stdlib/SparseArrays/src/sparsematrix.jl Outdated Show resolved Hide resolved
stdlib/SparseArrays/test/sparse.jl Outdated Show resolved Hide resolved
stdlib/SparseArrays/test/sparse.jl Outdated Show resolved Hide resolved
stdlib/SparseArrays/src/sparsevector.jl Show resolved Hide resolved
@Sacha0
Copy link
Member

Sacha0 commented Jan 1, 2019

After a bit of memory dredging, the rationale I recall for the present design is roughly the following:

At some point consensus formed that, for sparse/structured/annotated matrices and vectors, similar should generally return the most similar object that it can while respecting: 1) the arguments to the similar call; and 2) requirements imposed by type stability. Relevant discussion is spread across several issues and pull requests; good take off points for design rationale spelunking are (certainly) #11574, JuliaLang/LinearAlgebra.jl#271, #18161, and (possibly) ##23905, #19372, and #17507.

For example, for D::Diagonal{Float64,Vector{Float64}}, similar(D) yields a Diagonal{Float64,Vector{Float64}}, similar(D, Float32) yields a Diagonal{Float32,Vector{Float32}}, and similar(D, (3, 4)) yields a SparseMatrixCSC{Float64,Int} (as the object allowing the most similar behavior to Diagonal but of arbitrary shape).

(Fun related aside: Part of the eventual consensus was that similar without a shape argument need not return an object with all entries mutable, whereas similar with a shape argument does; this distinction/flexibility proved quite useful.)

Another example is that for U::UpperTriangular{Float64,Diagonal{Float64,Vector{Float64}}}, similar(U) yields a UpperTriangular{Float64,Diagonal{Float64,Vector{Float64}}}, similar(U, Float32) yields a UpperTriangular{Float32,Diagonal{Float32,Vector{Float32}}}, and similar(U, (4, 3)) yields a SparseMatrixCSC{Float64,Int}.

That is, similar preserves everything it reasonably can, including container types, annotations, element types, structure, form of backing storage, et cetera. In this the sparse similar methods are consistent with similar for other sparse/structured/annotated matrices and vectors.

This information-preserving behavior is useful, for example in generic programming: When you send these special AbstractArray subtypes through a generic function, for the most part you get results that preserve as much information about the original objects as possible, and as a side effect both the generic function and downstream operations exploit that information insofar as possible. This happy behavior happens in part thanks to this design for similar.

Absent this design, on a coarse level you end up with generic operations that, e.g., when called on Diagonal return Matrix instead of the Diagonal that you might expect, and perhaps require O(N^2) or O(N^3) work instead of the O(N) you might expect.

On a finer level, and concerning the particular case at hand, you might end up with e.g. more allocations being necessary than you might expect. For example, suppose that some method one way or another does

C = similar(A[, eltype[, shape]])
[...]
[broadcast!|transpose!|copyto!]([op, ]C, A)

The broadcast!/transpose!/copyto! need not be allocating, and the present design avoids unnecessary allocation where possible. (In fact, the present design often yields precisely the correct amount of storage in a single allocation for such calls.) The same behavior isn't necessarily readily achievable without similar having its present design. (IIRC, this is the manner in which similar's behavior impacts e.g. sparse broadcast, mentioned above and in the preceding thread.)

On the other hand, if you want an empty sparse matrix with the same eltype and/or shape as an existing sparse matrix, you can express that explicitly. I'm not sure that we should throw out the benefits of the present similar behavior for sparse matrices/vectors, and reduce its consistency with similar's behavior for other special matrix types, for the sake of having another way to express "give me an empty sparse matrix of given eltype and shape". Best! S

@KlausC
Copy link
Contributor Author

KlausC commented Jan 1, 2019

I would like to contradict @Sacha0 with all necessary respect, whence I am also a fan of consistency in design and API.

  1. the current implementation of similar(A, d1, d2) allocates memory for arrays rowvalandnzvalboth of sizennz(A)`, the proposed change allocates these arrays with size zero.
  2. advantage of the current implementation: if the user actually needs space of exactly that size, he has not to re-allocate memory. Compared to the other case he saves in the baset case 2 allocations with length 0.
  3. advantage of the proposed implementation: if the user actually needs less space, he has not to resize the arrays in order to get rid of the unused memory. Compared to the other solution he saves 2 allocations of maximum size in the best case.
  4. allocating memory of the same size can exceed the virtual memory available. In this case the application get an out-of-memory exception, while it could succeed otherwise.
  5. there may be other ways of creating a sparse matrix with same element types {Tv,Ti} and given dimensions and zero content. Actually I don't see a way as easy as similar(A, dims...).
  6. the user does not typically have control about when and how similaris used in the background, so he has not the choice of using an alternative. Examples: sum(A, dims=2) or mapreduce in general.
  7. if the number of target dimensions is one (result is a SparseVector), similar does not behave this way, but always returns arrays of length 0. This inconsistency is resolved by the proposal.
  8. I cannot see any value in having max(length(A.rowval), length(A.nzval)) > A.m * A.n
  9. The "invisible" extra memory might help security exploits to be not so easy to detect
  10. there is a negative performance impact when the result of similar(A, dims...) is accessed by setindex! for the first time after creation (see following log). I think, the time factor of 10^5 !! is a good argument.
julia> n = 10^4
10000

julia> Random.seed!(0); A = sprandn(n, n, 10^6/n^2); nnz(A)
1000598

# Old version:
julia> @benchmark B = similar($A, 10^4, 1)
BenchmarkTools.Trial: 
  memory estimate:  15.27 MiB
  allocs estimate:  6
  --------------
  minimum time:     817.000 ns (0.00% GC)
  median time:      93.000 μs (99.05% GC)
  mean time:        75.998 μs (98.73% GC)
  maximum time:     62.572 ms (99.99% GC)
  --------------
  samples:          10000
  evals/sample:     1
julia> @benchmark setindex!(B,1.0,4,1) setup=(B = similar($A, 10^4,1 ))
BenchmarkTools.Trial: 
  memory estimate:  2.54 MiB
  allocs estimate:  0
  --------------
  minimum time:     1.207 ms (3.27% GC)
  median time:      1.678 ms (0.00% GC)
  mean time:        1.579 ms (2.53% GC)
  maximum time:     11.460 ms (89.44% GC)
  --------------
  samples:          462
  evals/sample:     6


### Proposed version
julia> @btime B = similar($A, 10^4, 1)
  92.351 ns (4 allocations: 304 bytes)
10000×1 SparseMatrixCSC{Float64,Int64} with 0 stored entries

julia> @benchmark B = similar($A, 10^4, 1)
BenchmarkTools.Trial: 
  memory estimate:  304 bytes
  allocs estimate:  4
  --------------
  minimum time:     88.860 ns (0.00% GC)
  median time:      97.306 ns (0.00% GC)
  mean time:        148.385 ns (32.39% GC)
  maximum time:     56.889 μs (99.76% GC)
  --------------
  samples:          10000
  evals/sample:     965

julia> @benchmark setindex!(B,1.0,4,1) setup=(B = similar($A, 10^4,1 ))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     9.116 ns (0.00% GC)
  median time:      9.302 ns (0.00% GC)
  mean time:        9.627 ns (0.00% GC)
  maximum time:     79.004 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

@Sacha0
Copy link
Member

Sacha0 commented Jan 1, 2019

First, happy new year everyone! 🎉 All the best to you and yours in the coming year! :)

Regarding

  1. advantage of the current implementation: if the user actually needs space of exactly that size, he has not to re-allocate memory. Compared to the other case he saves in the baset case 2 allocations with length 0.

and

  1. advantage of the proposed implementation: if the user actually needs less space, he has not to resize the arrays in order to get rid of the unused memory. Compared to the other solution he saves 2 allocations of maximum size in the best case.

It might be worth noting that, regrettably, additional wrinkles exist. For example, given that we tend to avoid storing numerical zeros (and especially previously unstored numerical zeros), you do not always know how much storage you will need at the outset of an operation; you may have to complete the entire operation before you know. That complicates best and worst case analysis of allocation behavior.

Take sparse broadcast again as a prime example, which (in the cases it was designed for) allocates only if it runs out of storage in the destination: You do not know how much storage e.g. sin.(A) might require, and typically the best guess is the storage of the argument(s); with that guess, e.g. sin.(A) ends up in good shape without reallocation (you typically end up with either the right amount of storage or only marginally more than you strictly need). If you do have to reallocate, which you would always have to do at the outset of the operation with the destination being empty, your options are roughly to: 1) reallocate pessimistically but only once and safely (which regrettably the present implementation does as a stopgap in some cases it wasn't designed for, and clearly isn't the happiest behavior); or 2) reallocate optimistically, at risk of having to reallocate repeatedly, and almost certainly allocating more space (by roughly your growth factor) than you will ultimately need. So the tradeoffs here are regrettably not merely two additional empty allocations one way or another.

Regarding

if the number of target dimensions is one (result is a SparseVector), similar does not behave this way, but always returns arrays of length 0. This inconsistency is resolved by the proposal.

Yes, this unfortunate inconsistency was forced by the present design of SparseVector. Specifically, whereas SparseMatrixCSC decouples buffer length and number of stored entries, SparseVector couples those two things, forcing tradeoffs in various operations and particularly this discrepancy between similar for SparseVectors and SparseMatrixCSC / related types. Several of us have long had in mind making SparseVector more flexible in this regard and simultaneously largely unifying it with SparseMatrixCSC; maybe one day :).

Regarding

  1. I cannot see any value in having max(length(A.rowval), length(A.nzval)) > A.m * A.n

Agreed! :) We should at least add a guard capping the length of the buffers appropriately.

Regarding

  1. there is a negative performance impact when the result of similar(A, dims...) is accessed by setindex! for the first time after creation (see following log). I think, the time factor of 10^5 !! is a good argument.

I'm not sure that first-touch performance via setindex! on a sparse matrix is a particularly meaningful or useful thing to optimize for. One can of course come up with tailored benchmarks breaking in favor of either implementation. And generally, concerning performance of sparse matrix operations, using setindex! and friends and not pre-allocating your buffers, you've already largely lost the battle.

Regarding

  1. there may be other ways of creating a sparse matrix with same element types {Tv,Ti} and given dimensions and zero content. Actually I don't see a way as easy as similar(A, dims...).

Nicer ways to spell "create an empty sparse matrix" may indeed be in order. Again that's something of an orthogonal issue, and doesn't seem like the best motivation for a breaking change that removes useful functionality that cannot be otherwise readily achieved.

Best!

@abraunst
Copy link
Contributor

abraunst commented Jan 2, 2019

First, happy new year everyone! All the best to you and yours in the coming year! :)

All the best to everyone :-)

Take sparse broadcast again as a prime example, which (in the cases it was designed for) allocates only if it runs out of storage in the destination: You do not know how much storage e.g. sin.(A) might require, and typically the best guess is the storage of the argument(s); with that guess, e.g. sin.(A) ends up in good shape without reallocation (you typically end up with either the right amount of storage or only marginally more than you strictly need). If you do have to reallocate, which you would always have to do at the outset of the operation with the destination being empty, your options are roughly to: 1) reallocate pessimistically but only once and safely (which regrettably the present implementation does as a stopgap in some cases it wasn't designed for, and clearly isn't the happiest behavior); or 2) reallocate optimistically, at risk of having to reallocate repeatedly, and almost certainly allocating more space (by roughly your growth factor) than you will ultimately need. So the tradeoffs here are regrettably not merely two additional empty allocations one way or another.

Wouldn't it suffice to sizehint! the nzval and rowval vectors, in order to perform the allocation, but keep their nominal size at zero? In this way we would have both coherence and the optimization gain due to preallocation. If it's too difficult to put it in the calling code (e.g. broadcast), the sizehint! call could be just left in similar. In this way the two problems seem to me more or less orthogonal to me...

@KlausC
Copy link
Contributor Author

KlausC commented Jan 2, 2019

@Sacha0, Happy new year again! All the best to you and yours in the coming year! :)

you do not always know how much storage you will need at the outset of an operation; you may have to complete the entire operation before you know. That complicates best and worst case analysis of allocation behavior.

I agree with that statement. Although it does not contradict my statements. I only say, that in one case we allocate a lot of memory - and it may be wrong, in the other case we allocate zero memory - which may be wrong as well. My argument is, that the expected cost is much higher in the first case.

Take sparse broadcast again as a prime example, ...

I think, that is not a use case for similar, because sparse broadcast! doesn't use similar at all, but the private method _allocres to allocate memory.

The broadcast!/transpose!/copyto! need not be allocating, and the present design avoids unnecessary allocation where possible.

Also copyto! makes no use of the pre-allocated buffers, but at first resizes those buffers.
transpose does not use similar. transpose! indeed breaks when called like this:

 julia> transpose!(similar(A, reverse(size(A))), A)
ERROR: ArgumentError: the length of destination argument `X`'s `rowval` array, `length(X.rowval) (= 0)`, must be greater than or equal to source argument `A`'s allocated entry count, `nnz(A) (= 1000698)`

But why should somebody prefer that over the simpler transpose(A) ?

or 2) reallocate optimistically, at risk of having to reallocate repeatedly, and almost certainly allocating more space (by roughly your growth factor) than you will ultimately need. So the tradeoffs here are regrettably not merely two additional empty allocations one way or another.

The two additional empty allocations are correct in the case, which I refer to, when after B = similar(A, dim1, dim2) calls resize!(B.rowval, nnz); resize!(B.nzval, nnz), because he is sure, that he really needs approximately nnz(A) storage space.

SparseMatrixCSC decouples buffer length and number of stored entries,

IMHO that is not a good idea. BTW this decoupling could also be done less visible to the API user by using sizehint!on the arrays. I was asking for examples for the benefits of the oversized buffers, but was not shown a concrete example up to now.
My main argument is not, that pre-allocation of memory is bad in general. But it is bad to allocate memory, and then do not use it! This habit is fostered by the current implementation.

... is a particularly meaningful or useful thing to optimize for.

Of course not! It was just the simplest example that came into my mind, when I tried to demonstrate the performance burden involved by the big buffer, which is moved around. You could argue, that setindex! is suboptimal: requiring a single storage space, it reallocates 10^8 + 1 integers and floats, and does not dare to touch the 10^8 unused cells, but that is another story.

using setindex! and friends and not pre-allocating your buffers, you've already largely lost the battle.

I don't propagate this, in contrary. Actually I am very concerned about the bad performance of linear algebra operations of wrapped sparse matrix operands. That is a consequence of using the "abstract array fallback" AKA getindex while ignoring the sparsity structure. I invested quite some time in several PRs improving this situation. I could even improve the performance of sparse mat-mat multiplication by providing a estimation of the fill factor of the result. So please don't think I am naive with respect to buffer allocation or sparse access questions.

One can of course come up with tailored benchmarks breaking in favor of either implementation.
a breaking change that removes useful functionality that

Again, I could be easily convinced by one concrete example, which demonstrates the usefulness of the current implementation of similar and "cannot be otherwise readily achieved". Also a benchmark where the changed implementation is essentially worse that the current would help me.

Best regards,
Klaus

@KlausC
Copy link
Contributor Author

KlausC commented Jan 2, 2019

Just another example:

julia> n = 10^4;
julia> Random.seed!(0); A = sprandn(n, n, 10^6/n^2); nnz(A)
1000598
julia> b = sprand(n, 0.5);

# BEFORE:
julia> @btime B[:] = $b setup=(B = similar($A, $n, 1));
  12.014 ms (3 allocations: 15.27 MiB)
# AFTER:
julia> @btime B[:] = $b setup=(B = similar($A, $n, 1));
  691.300 μs (25 allocations: 257.05 KiB)

Here the time factor is only 17. Look also at memory allocation - while the count increases as predicted the total amount is much less (again no use is made of the existing buffer in the BEFORE case).

@abraunst
Copy link
Contributor

abraunst commented Jan 8, 2019

What is the status of this PR? It would be good to have this in.

@ViralBShah
Copy link
Member

As the discussion suggests, we do not have consensus here. I'm going to suggest leaving this open. @abraunst Is there a particular reason you would like it merged?

The benchmark above is not convincing because the intended use case of similar was to allocate the memory and hence users that don't want it that way simply end up avoiding it.

@abraunst
Copy link
Contributor

abraunst commented Jan 8, 2019

As the discussion suggests, we do not have consensus here. I'm going to suggest leaving this open. @abraunst Is there a particular reason you would like it merged?

The benchmark above is not convincing because the intended use case of similar was to allocate the memory and hence users that don't want it that way simply end up avoiding it.

I think that the current situation is bad (in particular having length(A.nzval) != nnz(A) for similar with dimension arguments). The reason for having it seems to have gone at least (given that sparse broadcast does not use similar). And at the very least, we could keep the allocation but preserve length(A.nzval) == nnz(A) by using sizehint! if performance really demands it...

@KlausC
Copy link
Contributor Author

KlausC commented Jan 9, 2019

the intended use case of similar was to allocate the memory and hence users that don't want it that way simply end up avoiding it.

The "intended use case" is not documented, though. So the user, aware of the docs, is maybe not expecting this.

Of course, the user can avoid using similar. Instead of similar(A, m, n)he simply can write spzeros(eltype(A), m, n) if he does not mind Ti being Int. Or he can write his own similar_without_unwanted_memory, behaving as he wants ;-).

@ViralBShah
Copy link
Member

Yes, I should clarify, it was just my intention. I agree that at the very least it should be documented clearly.

@KlausC
Copy link
Contributor Author

KlausC commented Jan 18, 2019

There seems to be consensus on approved JuliaSparse/SparseArrays.jl#44, which partially supersedes this PR. So I close this one.

@KlausC KlausC closed this Jan 18, 2019
@Sacha0
Copy link
Member

Sacha0 commented Jan 21, 2019

Hi all! Though this issue was closed in the interim, I finally have some time to respond and thought recording some thoughts might be useful for posterity:

Regarding

I think, that is not a use case for similar, because sparse broadcast! doesn't use similar at all, but the private method _allocres to allocate memory.

and

I was asking for examples for the benefits of the oversized buffers, but was not shown a concrete example up to now.

and

Again, I could be easily convinced by one concrete example, which demonstrates the usefulness of the current implementation of similar and "cannot be otherwise readily achieved".

and

The reason for having it seems to have gone at least (given that sparse broadcast does not use similar).

Please reread #30435 (comment) (particularly the last two paragraphs) and #30435 (comment). As I explain there, it's not in the implementation of sparse broadcast that similar's behavior is important; rather, it's in use of similar in coordination with sparse broadcast!/transpose!/copyto! and similar such mutating operations that similar's behavior is important.

Also copyto! makes no use of the pre-allocated buffers, but at first resizes those buffers.

Note that those resize!s will be no-ops in the relevant cases thanks to the present behavior of similar, but must allocate otherwise.

But why should somebody prefer that [(e.g. transpose!(C, A))] over the simpler transpose(A) ?

When working with preallocated storage, to avoid unnecessary allocation and memory overhead?

SparseMatrixCSC decouples buffer length and number of stored entries,

IMHO that is not a good idea. BTW this decoupling could also be done less visible to the API user by using sizehint!on the arrays.

Decoupling of buffer length and number of stored entries is a well established property of compressed sparse data structures, and is regularly exploited in sparse linear algebra libraries. I will say more on this front in the relevant new thread later, but for now here's some previous discussion on the topic: #20464 (comment) .

Wouldn't it suffice to sizehint! the nzval and rowval vectors, in order to perform the allocation, but keep their nominal size at zero?

The trouble we previously ran into with this approach is that the actual buffer sizes are then not exposed, and exposure of the buffer sizes to mutating methods is what allows those methods to avoid making guesses about what the size of the result should be. (Ref. e.g. the first couple paragraphs of
#30435 (comment) for a sense of what I mean.)

So please don't think I am naive with respect to buffer allocation or sparse access questions. [...] I invested quite some time in several PRs improving this situation. I could even improve the performance of sparse mat-mat multiplication by providing a estimation of the fill factor of the result.

I both very much appreciate and am well aware of your work on things sparse, having reviewed much of that work :). Please consider reading comments charitably by default.

Best!

@ViralBShah
Copy link
Member

I would like to reinforce the comment about being extremely appreciative of the contributions.

@abraunst
Copy link
Contributor

Sorry @Sacha0, I hope you forgive me for testing your patience. I may be bold, but I think I'm missing the point somehow.

Wouldn't it suffice to sizehint! the nzval and rowval vectors, in order to perform the allocation, but keep their nominal size at zero?

The trouble we previously ran into with this approach is that the actual buffer sizes are then not exposed, and exposure of the buffer sizes to mutating methods is what allows those methods to avoid making guesses about what the size of the result should be. (Ref. e.g. the first couple paragraphs of
#30435 (comment) for a sense of what I mean.)

In the first couple of paragraphs in #30435 (comment) the example is sin.(A), but the similar method with no shape argument would be used there (which in any case fullly preserves non-zero structure besides storage space). In any case, sizehint! would have solved the problem AFAICS.

Could you give an example in which sizehint! does not suffice?

I think that this issue has potentially a large performance impact (in any way it is resolved). So maybe the best would be to propose concrete benchmarks (to add to the benchmark set) so we can test each of the approaches. What do you think? Otherwise, this issue will keep coming up IMHO...

@KlausC
Copy link
Contributor Author

KlausC commented Jan 24, 2019

it's in use of similar in coordination with sparse broadcast!/transpose!/copyto! and similar such mutating operations that similar's behavior is important.

Actually only transpose!(similar(A, n, m), A) relies on similar to provide sufficient buffer size. The other two methods resize the buffers if not sufficient. My approach would be to change ftranspose! to do the re-size instead of throwing as well.

Note that those resize!s will be no-ops in the relevant cases thanks to the present behavior of similar, but must allocate otherwise.

That is true, but only once! That means, if similar allocated vectors of zero length, the first resize within broadcast!, transpose!, copyto! would do the expensive big allocation, while the previous zero allocation would be in vain. In other cases, when the required size is smaller than what similardelivers, the big allocation is for nothing. But as I meant in the begin of this thread, the expectation value for the computational effort might be lower, when we consider a representative sample of use cases.

@KlausC
Copy link
Contributor Author

KlausC commented Jan 31, 2019

The spirit of the is completely covered by JuliaSparse/SparseArrays.jl#44.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sparse Sparse arrays
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants